The notebook uses scib package as a wrapper for all integrations methods, however it doesn't allow to set method parameters and uses defaults. I strongly recommend to use the methods itself in real life, this notebook is for demonstrational purposes only.
import scib
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
plt.rcParams['figure.figsize'] = [6, 6]
sc.logging.print_header()
scanpy==1.9.3 anndata==0.8.0 umap==0.5.3 numpy==1.21.6 scipy==1.7.3 pandas==1.3.5 scikit-learn==1.0.2 statsmodels==0.13.5 python-igraph==0.10.4 louvain==0.8.0 pynndescent==0.5.7
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/umap/__init__.py:11: ImportWarning: Tensorflow not installed; ParametricUMAP will be unavailable category=ImportWarning,
!wget -O ../data/human_pancreas_norm_complexBatch.h5ad https://figshare.com/ndownloader/files/24539828
--2023-03-16 22:57:43-- https://figshare.com/ndownloader/files/24539828 Resolving figshare.com (figshare.com)... 34.240.113.119, 54.229.56.157, 2a05:d018:1f4:d000:8f2d:b761:e885:37ef, ... Connecting to figshare.com (figshare.com)|34.240.113.119|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/24539828/human_pancreas_norm_complexBatch.h5ad?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIYCQYOYV5JSSROOA/20230316/eu-west-1/s3/aws4_request&X-Amz-Date=20230316T225743Z&X-Amz-Expires=10&X-Amz-SignedHeaders=host&X-Amz-Signature=e4a8c8e9c9105c5aad05b5a8b72332d78c92e2d4dccff9c0a1e00e569d7938a2 [following] --2023-03-16 22:57:43-- https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/24539828/human_pancreas_norm_complexBatch.h5ad?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIYCQYOYV5JSSROOA/20230316/eu-west-1/s3/aws4_request&X-Amz-Date=20230316T225743Z&X-Amz-Expires=10&X-Amz-SignedHeaders=host&X-Amz-Signature=e4a8c8e9c9105c5aad05b5a8b72332d78c92e2d4dccff9c0a1e00e569d7938a2 Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... 52.92.16.240, 52.92.0.16, 52.218.96.98, ... Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|52.92.16.240|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 315955785 (301M) [application/octet-stream] Saving to: ‘../data/human_pancreas_norm_complexBatch.h5ad’ ../data/human_pancr 100%[===================>] 301.32M 98.7MB/s in 3.1s 2023-03-16 22:57:46 (98.7 MB/s) - ‘../data/human_pancreas_norm_complexBatch.h5ad’ saved [315955785/315955785]
adata = sc.read('../data/human_pancreas_norm_complexBatch.h5ad')
adata
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/h5ad.py:238: OldFormatWarning: Element '/X' was written without encoding metadata.
d[k] = read_elem(f[k])
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/h5ad.py:238: OldFormatWarning: Element '/layers' was written without encoding metadata.
d[k] = read_elem(f[k])
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:92: OldFormatWarning: Element '/layers/counts' was written without encoding metadata.
return {k: read_elem(v) for k, v in elem.items()}
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:584: OldFormatWarning: Element '/obs/__categories/tech' was written without encoding metadata.
categories = read_elem(categories_dset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:587: OldFormatWarning: Element '/obs/tech' was written without encoding metadata.
read_elem(dataset), categories, ordered=ordered
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:584: OldFormatWarning: Element '/obs/__categories/celltype' was written without encoding metadata.
categories = read_elem(categories_dset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:587: OldFormatWarning: Element '/obs/celltype' was written without encoding metadata.
read_elem(dataset), categories, ordered=ordered
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:590: OldFormatWarning: Element '/obs/size_factors' was written without encoding metadata.
return read_elem(dataset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:590: OldFormatWarning: Element '/obs/_index' was written without encoding metadata.
return read_elem(dataset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:590: OldFormatWarning: Element '/var/_index' was written without encoding metadata.
return read_elem(dataset)
AnnData object with n_obs × n_vars = 16382 × 19093
obs: 'tech', 'celltype', 'size_factors'
layers: 'counts'
adata.X
array([[0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[2.6120808 , 0. , 0. , ..., 0. , 2.6120806 ,
0. ],
[0. , 3.311074 , 0. , ..., 0. , 0. ,
0. ],
...,
[0.42996258, 2.6206095 , 0. , ..., 2.1124895 , 1.0953737 ,
1.0403827 ],
[3.4695568 , 0.64595073, 0. , ..., 0. , 0. ,
0. ],
[0.4566651 , 0.5159923 , 0. , ..., 0. , 0.53929645,
0. ]], dtype=float32)
adata.obs
| tech | celltype | size_factors | |
|---|---|---|---|
| D101_5 | celseq | gamma | 0.028492 |
| D101_43 | celseq | gamma | 0.079348 |
| D101_93 | celseq | gamma | 0.037932 |
| D102_4 | celseq | gamma | 0.047685 |
| D172444_23 | celseq | gamma | 0.038683 |
| ... | ... | ... | ... |
| Sample_1594 | smarter | gamma | 1.000000 |
| Sample_1595 | smarter | gamma | 1.000000 |
| Sample_1597 | smarter | gamma | 1.000000 |
| Sample_1598 | smarter | gamma | 1.000000 |
| Sample_1600 | smarter | gamma | 1.000000 |
16382 rows × 3 columns
adata.var
| A1BG |
|---|
| A1CF |
| A2M |
| A2ML1 |
| A4GALT |
| ... |
| ZXDC |
| ZYG11B |
| ZYX |
| ZZEF1 |
| ZZZ3 |
19093 rows × 0 columns
# counts, really?
adata.layers['counts']
array([[ 0. , 0. , 0. , ..., 0. , 0. ,
0. ],
[ 1.0019583, 0. , 0. , ..., 0. , 1.001958 ,
0. ],
[ 0. , 1.0019583, 0. , ..., 0. , 0. ,
0. ],
...,
[ 0.5372 , 12.7441 , 0. , ..., 7.2688 , 1.9903 ,
1.8303 ],
[31.1225 , 0.9078 , 0. , ..., 0. , 0. ,
0. ],
[ 0.5788 , 0.6753 , 0. , ..., 0. , 0.7148 ,
0. ]], dtype=float32)
# while "counts" are obviously not real counts and smarter should be rather RPKM than something proportional to counts, lets sti
adata.var['mt'] = adata.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
adata.var['rb'] = adata.var_names.str.startswith(('RPS','RPL')) # annotate the group of ribosomal proteins as 'rb'
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True,layer='counts')
sc.pp.calculate_qc_metrics(adata, qc_vars=['rb'], percent_top=None, log1p=False, inplace=True,layer='counts')
adata.obs
| tech | celltype | size_factors | n_genes_by_counts | total_counts | total_counts_mt | pct_counts_mt | total_counts_rb | pct_counts_rb | |
|---|---|---|---|---|---|---|---|---|---|
| D101_5 | celseq | gamma | 0.028492 | 1857 | 4057.480225 | 0.0 | 0.0 | 398.486694 | 9.821038 |
| D101_43 | celseq | gamma | 0.079348 | 3724 | 10604.392578 | 0.0 | 0.0 | 1138.060791 | 10.731975 |
| D101_93 | celseq | gamma | 0.037932 | 2261 | 4921.610840 | 0.0 | 0.0 | 569.862549 | 11.578781 |
| D102_4 | celseq | gamma | 0.047685 | 2653 | 6157.592773 | 0.0 | 0.0 | 601.320435 | 9.765512 |
| D172444_23 | celseq | gamma | 0.038683 | 2230 | 4803.835938 | 0.0 | 0.0 | 240.532822 | 5.007099 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_1594 | smarter | gamma | 1.000000 | 5842 | 495900.125000 | 0.0 | 0.0 | 36327.195312 | 7.325506 |
| Sample_1595 | smarter | gamma | 1.000000 | 5196 | 326958.031250 | 0.0 | 0.0 | 23983.968750 | 7.335488 |
| Sample_1597 | smarter | gamma | 1.000000 | 6356 | 420117.375000 | 0.0 | 0.0 | 28299.171875 | 6.736015 |
| Sample_1598 | smarter | gamma | 1.000000 | 4170 | 410836.906250 | 0.0 | 0.0 | 30692.054688 | 7.470617 |
| Sample_1600 | smarter | gamma | 1.000000 | 4007 | 456183.281250 | 0.0 | 0.0 | 27750.597656 | 6.083212 |
16382 rows × 9 columns
# I'm not sure what is in 'counts' lets compare it with adata.X (that should be scran normalazed log + 1 transformed)
x = adata.obs['l10total_counts'] = np.log10(adata.obs.total_counts)
y = adata.X.sum(1)
for t in adata.obs.tech.unique():
f = adata.obs.tech == t
plt.scatter(x[f],y[f],s=0.5,label=t)
plt.legend(loc='upper left',fontsize=10,markerscale=10)
<matplotlib.legend.Legend at 0x7fa0094e6150>
plt.scatter(adata.obs.l10total_counts,adata.obs.pct_counts_rb,s=0.5)
<matplotlib.collections.PathCollection at 0x7fa0098bf850>
sc.pl.violin(adata, ['pct_counts_rb'], groupby = 'tech', stripplot = False,rotation=90)
sc.pl.violin(adata, ['pct_counts_rb'], groupby = 'celltype', stripplot = False,rotation=90)
Look like something non-log transformed (it is better to compare values in X and "counts" directly)
sc.pl.violin(adata, ['l10total_counts'], groupby = 'tech', stripplot = False,rotation=90)
pd.crosstab(adata.obs.tech,adata.obs.celltype)
| celltype | acinar | activated_stellate | alpha | beta | delta | ductal | endothelial | epsilon | gamma | macrophage | mast | quiescent_stellate | schwann | t_cell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tech | ||||||||||||||
| celseq | 228 | 19 | 191 | 161 | 50 | 327 | 5 | 1 | 18 | 1 | 1 | 1 | 1 | 0 |
| celseq2 | 274 | 90 | 843 | 445 | 203 | 258 | 21 | 4 | 110 | 15 | 6 | 12 | 4 | 0 |
| fluidigmc1 | 21 | 16 | 239 | 258 | 25 | 36 | 14 | 1 | 18 | 1 | 3 | 1 | 5 | 0 |
| inDrop1 | 110 | 51 | 236 | 872 | 214 | 120 | 130 | 13 | 70 | 14 | 8 | 92 | 5 | 2 |
| inDrop2 | 3 | 81 | 676 | 371 | 125 | 301 | 23 | 2 | 86 | 17 | 9 | 22 | 6 | 2 |
| inDrop3 | 843 | 100 | 1130 | 787 | 161 | 376 | 92 | 2 | 36 | 14 | 7 | 54 | 1 | 2 |
| inDrop4 | 2 | 52 | 284 | 495 | 101 | 280 | 7 | 1 | 63 | 10 | 1 | 5 | 1 | 1 |
| smarter | 0 | 0 | 886 | 472 | 49 | 0 | 0 | 0 | 85 | 0 | 0 | 0 | 0 | 0 |
| smartseq2 | 188 | 55 | 1008 | 308 | 127 | 444 | 21 | 8 | 213 | 7 | 7 | 6 | 2 | 0 |
adata.raw = adata
# values looks log-transformed (as expected)
z=plt.hist(adata.X.flatten())
sc.pp.highly_variable_genes(adata)
adata.var.highly_variable.value_counts()
False 15548 True 3545 Name: highly_variable, dtype: int64
sc.pl.highly_variable_genes(adata)
adata = adata[:, adata.var.highly_variable]
adata
View of AnnData object with n_obs × n_vars = 16382 × 3545
obs: 'tech', 'celltype', 'size_factors', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rb', 'pct_counts_rb', 'l10total_counts'
var: 'mt', 'rb', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
uns: 'tech_colors', 'celltype_colors', 'hvg'
layers: 'counts'
sc.pp.scale(adata)
sc.tl.pca(adata)
sc.pl.pca_variance_ratio(adata, log=True)
sc.pp.neighbors(adata,n_neighbors=15, n_pcs=50)
sc.tl.umap(adata)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scanpy/preprocessing/_simple.py:843: UserWarning: Received a view of an AnnData. Making a copy. view_to_actual(adata)
sc.pl.umap(adata,color=['tech', 'celltype'])
according to https://github.com/theislab/scib-reproducibility/tree/main/notebooks/data_preprocessing/pancreas the data is already normalazed and log-transformed so we will skip scib.preprocessing.normalize
#adata = adata.raw.to_adata() # something that looks like numpy bug happens below if I use 'raw', so I'll re-load the dataset
adata = sc.read('../data/human_pancreas_norm_complexBatch.h5ad')
adata
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/h5ad.py:238: OldFormatWarning: Element '/X' was written without encoding metadata.
d[k] = read_elem(f[k])
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/h5ad.py:238: OldFormatWarning: Element '/layers' was written without encoding metadata.
d[k] = read_elem(f[k])
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:92: OldFormatWarning: Element '/layers/counts' was written without encoding metadata.
return {k: read_elem(v) for k, v in elem.items()}
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:584: OldFormatWarning: Element '/obs/__categories/tech' was written without encoding metadata.
categories = read_elem(categories_dset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:587: OldFormatWarning: Element '/obs/tech' was written without encoding metadata.
read_elem(dataset), categories, ordered=ordered
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:584: OldFormatWarning: Element '/obs/__categories/celltype' was written without encoding metadata.
categories = read_elem(categories_dset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:587: OldFormatWarning: Element '/obs/celltype' was written without encoding metadata.
read_elem(dataset), categories, ordered=ordered
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:590: OldFormatWarning: Element '/obs/size_factors' was written without encoding metadata.
return read_elem(dataset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:590: OldFormatWarning: Element '/obs/_index' was written without encoding metadata.
return read_elem(dataset)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_io/specs/methods.py:590: OldFormatWarning: Element '/var/_index' was written without encoding metadata.
return read_elem(dataset)
AnnData object with n_obs × n_vars = 16382 × 19093
obs: 'tech', 'celltype', 'size_factors'
layers: 'counts'
# scib modifies bath column at least in scib.preprocessing.scale_batch to integer, so I make a copy to have text representation
adata.obs['tech_text'] = adata.obs.tech
ccscores = scib.preprocessing.score_cell_cycle(adata,'human')
adata.obs
| tech | celltype | size_factors | tech_text | S_score | G2M_score | phase | |
|---|---|---|---|---|---|---|---|
| D101_5 | celseq | gamma | 0.028492 | celseq | 0.050221 | -0.422348 | S |
| D101_43 | celseq | gamma | 0.079348 | celseq | 0.012194 | -0.536017 | S |
| D101_93 | celseq | gamma | 0.037932 | celseq | -0.022595 | 0.015658 | G2M |
| D102_4 | celseq | gamma | 0.047685 | celseq | 0.128670 | -0.260789 | S |
| D172444_23 | celseq | gamma | 0.038683 | celseq | -0.041654 | -0.394326 | G1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_1594 | smarter | gamma | 1.000000 | smarter | 0.252066 | -0.508501 | S |
| Sample_1595 | smarter | gamma | 1.000000 | smarter | 0.090100 | -0.388399 | S |
| Sample_1597 | smarter | gamma | 1.000000 | smarter | 0.027931 | -0.461743 | S |
| Sample_1598 | smarter | gamma | 1.000000 | smarter | 0.013514 | -0.207898 | S |
| Sample_1600 | smarter | gamma | 1.000000 | smarter | -0.198831 | -0.263792 | G1 |
16382 rows × 7 columns
adata = scib.preprocessing.hvg_batch(
adata,
batch_key='tech',
adataOut=True)
Using 65 HVGs from full intersect set Using 221 HVGs from n_batch-1 set Using 274 HVGs from n_batch-2 set Using 269 HVGs from n_batch-3 set Using 349 HVGs from n_batch-4 set Using 435 HVGs from n_batch-5 set Using 387 HVGs from n_batch-6 set Using 2000 HVGs
Does per batch HVG selection change integration?
adata_hgv = adata.copy()
sc.pp.scale(adata_hgv)
sc.tl.pca(adata_hgv)
sc.pp.neighbors(adata_hgv,n_neighbors=15, n_pcs=50)
sc.tl.umap(adata_hgv)
sc.pl.umap(adata_hgv,color=['tech', 'celltype'])
not really
#lets check per gene sd, is it scaled (per batch?)
plt.hist(np.std(adata.X,axis=0))
(array([339., 386., 476., 407., 237., 101., 36., 12., 4., 2.]),
array([0.03652171, 0.407474 , 0.7784263 , 1.1493785 , 1.5203309 ,
1.8912832 , 2.2622354 , 2.6331878 , 3.0041401 , 3.3750923 ,
3.7460446 ], dtype=float32),
<BarContainer object of 10 artists>)
plt.hist(np.std(adata.X[adata.obs.tech_text=='inDrop1',:],axis=0))
(array([263., 316., 346., 385., 256., 190., 120., 76., 37., 11.]),
array([0. , 0.28488523, 0.56977046, 0.8546556 , 1.1395409 ,
1.4244261 , 1.7093112 , 1.9941965 , 2.2790818 , 2.563967 ,
2.8488522 ], dtype=float32),
<BarContainer object of 10 artists>)
adata_scaled = scib.preprocessing.scale_batch(adata.copy(), 'tech')
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour. [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
#lets check per gene sd
z=plt.hist(adata_scaled.X.std(0))
z=plt.hist(np.std(adata_scaled.X[adata.obs.tech_text=='inDrop1',:],axis=0,ddof=1))
plt.scatter((adata_scaled.X[adata.obs.tech_text=='inDrop1',:]>0).sum(0),np.std(adata_scaled.X[adata.obs.tech_text=='inDrop1',:],axis=0,ddof=1))
<matplotlib.collections.PathCollection at 0x7f9f0bbf81d0>
sc.tl.pca(adata_scaled)
sc.pp.neighbors(adata_scaled,n_neighbors=15, n_pcs=50)
sc.tl.umap(adata_scaled)
scib.me.cluster_optimal_resolution(adata_scaled, cluster_key="cluster", label_key="celltype")
adata_scaled.obs
resolution: 0.1, nmi: 0.9156420388605553 resolution: 0.2, nmi: 0.8087496247854713 resolution: 0.3, nmi: 0.7782488166065289 resolution: 0.4, nmi: 0.7800075129110834 resolution: 0.5, nmi: 0.7797929269447642 resolution: 0.6, nmi: 0.7437295775094869 resolution: 0.7, nmi: 0.7407492749510892 resolution: 0.8, nmi: 0.7266569593288498 resolution: 0.9, nmi: 0.717991307099095 resolution: 1.0, nmi: 0.7179073336414186 resolution: 1.1, nmi: 0.7181581866119924 resolution: 1.2, nmi: 0.7018025542782351 resolution: 1.3, nmi: 0.6961133059481948 resolution: 1.4, nmi: 0.683600959680538 resolution: 1.5, nmi: 0.6867023183996811 resolution: 1.6, nmi: 0.6827121108193284 resolution: 1.7, nmi: 0.6792901584183743 resolution: 1.8, nmi: 0.6753639612668919 resolution: 1.9, nmi: 0.6692339834664677 resolution: 2.0, nmi: 0.6675011423668421 optimised clustering against celltype optimal cluster resolution: 0.1 optimal score: 0.9156420388605553
| tech | celltype | size_factors | tech_text | S_score | G2M_score | phase | cluster | |
|---|---|---|---|---|---|---|---|---|
| D101_5 | 0 | gamma | 0.028492 | celseq | 0.050221 | -0.422348 | S | 5 |
| D101_43 | 0 | gamma | 0.079348 | celseq | 0.012194 | -0.536017 | S | 5 |
| D101_93 | 0 | gamma | 0.037932 | celseq | -0.022595 | 0.015658 | G2M | 5 |
| D102_4 | 0 | gamma | 0.047685 | celseq | 0.128670 | -0.260789 | S | 5 |
| D172444_23 | 0 | gamma | 0.038683 | celseq | -0.041654 | -0.394326 | G1 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_1594 | 7 | gamma | 1.000000 | smarter | 0.252066 | -0.508501 | S | 5 |
| Sample_1595 | 7 | gamma | 1.000000 | smarter | 0.090100 | -0.388399 | S | 5 |
| Sample_1597 | 7 | gamma | 1.000000 | smarter | 0.027931 | -0.461743 | S | 5 |
| Sample_1598 | 7 | gamma | 1.000000 | smarter | 0.013514 | -0.207898 | S | 5 |
| Sample_1600 | 7 | gamma | 1.000000 | smarter | -0.198831 | -0.263792 | G1 | 5 |
16382 rows × 8 columns
Even without integration the umap is much better than before - simply because per-batch scaling and per-batch HGV selection (i think that the former is more important)
sc.pl.umap(adata_scaled,color=['tech', 'celltype','cluster'])
adata_bbknn = scib.ig.bbknn(adata.copy(), batch="tech")
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/bbknn/__init__.py:294: FutureWarning: This location for 'distances' is deprecated. It has been moved to .obsp[distances], and will not be accesible here in a future version of anndata. adata.uns['neighbors']['distances'] = bbknn_out[0] /home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/bbknn/__init__.py:295: FutureWarning: This location for 'connectivities' is deprecated. It has been moved to .obsp[connectivities], and will not be accesible here in a future version of anndata. adata.uns['neighbors']['connectivities'] = bbknn_out[1]
adata_bbknn
AnnData object with n_obs × n_vars = 16382 × 2000
obs: 'tech', 'celltype', 'size_factors', 'tech_text', 'S_score', 'G2M_score', 'phase'
var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
uns: 'hvg', 'pca', 'neighbors'
obsm: 'X_pca'
varm: 'PCs'
layers: 'counts'
obsp: 'distances', 'connectivities'
sc.tl.umap(adata_bbknn)
scib.me.cluster_optimal_resolution(adata_bbknn, cluster_key="cluster", label_key="celltype")
adata_bbknn.obs
resolution: 0.1, nmi: 0.8938782286420873 resolution: 0.2, nmi: 0.9120911810470724 resolution: 0.3, nmi: 0.9113854914701742 resolution: 0.4, nmi: 0.9115756130814252 resolution: 0.5, nmi: 0.9103945363254168 resolution: 0.6, nmi: 0.8386350931872477 resolution: 0.7, nmi: 0.835770184640433 resolution: 0.8, nmi: 0.8241136119306443 resolution: 0.9, nmi: 0.8261523501912349 resolution: 1.0, nmi: 0.799304853457019 resolution: 1.1, nmi: 0.7986304510002359 resolution: 1.2, nmi: 0.7676323533221089 resolution: 1.3, nmi: 0.746109895443401 resolution: 1.4, nmi: 0.7345261484851737 resolution: 1.5, nmi: 0.7338798749257783 resolution: 1.6, nmi: 0.7332698757059296 resolution: 1.7, nmi: 0.7340952710155013 resolution: 1.8, nmi: 0.7341183110681136 resolution: 1.9, nmi: 0.7381410709400347 resolution: 2.0, nmi: 0.7387541356450916 optimised clustering against celltype optimal cluster resolution: 0.2 optimal score: 0.9120911810470724
| tech | celltype | size_factors | tech_text | S_score | G2M_score | phase | cluster | |
|---|---|---|---|---|---|---|---|---|
| D101_5 | celseq | gamma | 0.028492 | celseq | 0.050221 | -0.422348 | S | 5 |
| D101_43 | celseq | gamma | 0.079348 | celseq | 0.012194 | -0.536017 | S | 5 |
| D101_93 | celseq | gamma | 0.037932 | celseq | -0.022595 | 0.015658 | G2M | 5 |
| D102_4 | celseq | gamma | 0.047685 | celseq | 0.128670 | -0.260789 | S | 5 |
| D172444_23 | celseq | gamma | 0.038683 | celseq | -0.041654 | -0.394326 | G1 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_1594 | smarter | gamma | 1.000000 | smarter | 0.252066 | -0.508501 | S | 5 |
| Sample_1595 | smarter | gamma | 1.000000 | smarter | 0.090100 | -0.388399 | S | 5 |
| Sample_1597 | smarter | gamma | 1.000000 | smarter | 0.027931 | -0.461743 | S | 5 |
| Sample_1598 | smarter | gamma | 1.000000 | smarter | 0.013514 | -0.207898 | S | 5 |
| Sample_1600 | smarter | gamma | 1.000000 | smarter | -0.198831 | -0.263792 | G1 | 5 |
16382 rows × 8 columns
sc.pl.umap(adata_bbknn,color=['tech', 'celltype','cluster'])
adata_combat = scib.ig.combat(adata.copy(), batch="tech")
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:340: RuntimeWarning: divide by zero encountered in true_divide (abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max()
adata_combat
AnnData object with n_obs × n_vars = 16382 × 2000
obs: 'tech', 'celltype', 'size_factors', 'tech_text', 'S_score', 'G2M_score', 'phase'
var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
uns: 'hvg'
layers: 'counts'
d = np.log10(np.abs((adata_combat.X - adata.X)))
z=plt.hist(d.flatten(),100)
sc.tl.pca(adata_combat)
sc.pp.neighbors(adata_combat,n_neighbors=15, n_pcs=50)
sc.tl.umap(adata_combat)
scib.me.cluster_optimal_resolution(adata_combat, cluster_key="cluster", label_key="celltype")
adata_combat.obs
resolution: 0.1, nmi: 0.9158658201196465 resolution: 0.2, nmi: 0.899382640262332 resolution: 0.3, nmi: 0.83196373802792 resolution: 0.4, nmi: 0.76320671294178 resolution: 0.5, nmi: 0.7631288504212965 resolution: 0.6, nmi: 0.7584723375194842 resolution: 0.7, nmi: 0.7600963027694647 resolution: 0.8, nmi: 0.7306378120339491 resolution: 0.9, nmi: 0.7197168480897322 resolution: 1.0, nmi: 0.6956475404897364 resolution: 1.1, nmi: 0.6897456369102555 resolution: 1.2, nmi: 0.6808475705042671 resolution: 1.3, nmi: 0.6671584210661791 resolution: 1.4, nmi: 0.6657325207694842 resolution: 1.5, nmi: 0.663576987670048 resolution: 1.6, nmi: 0.6639149105319587 resolution: 1.7, nmi: 0.6628713365889642 resolution: 1.8, nmi: 0.6684306005842009 resolution: 1.9, nmi: 0.6602203585779869 resolution: 2.0, nmi: 0.6703617320809744 optimised clustering against celltype optimal cluster resolution: 0.1 optimal score: 0.9158658201196465
| tech | celltype | size_factors | tech_text | S_score | G2M_score | phase | cluster | |
|---|---|---|---|---|---|---|---|---|
| D101_5 | celseq | gamma | 0.028492 | celseq | 0.050221 | -0.422348 | S | 5 |
| D101_43 | celseq | gamma | 0.079348 | celseq | 0.012194 | -0.536017 | S | 5 |
| D101_93 | celseq | gamma | 0.037932 | celseq | -0.022595 | 0.015658 | G2M | 5 |
| D102_4 | celseq | gamma | 0.047685 | celseq | 0.128670 | -0.260789 | S | 5 |
| D172444_23 | celseq | gamma | 0.038683 | celseq | -0.041654 | -0.394326 | G1 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_1594 | smarter | gamma | 1.000000 | smarter | 0.252066 | -0.508501 | S | 5 |
| Sample_1595 | smarter | gamma | 1.000000 | smarter | 0.090100 | -0.388399 | S | 5 |
| Sample_1597 | smarter | gamma | 1.000000 | smarter | 0.027931 | -0.461743 | S | 0 |
| Sample_1598 | smarter | gamma | 1.000000 | smarter | 0.013514 | -0.207898 | S | 5 |
| Sample_1600 | smarter | gamma | 1.000000 | smarter | -0.198831 | -0.263792 | G1 | 5 |
16382 rows × 8 columns
sc.pl.umap(adata_combat,color=['tech', 'celltype','cluster'])
adata_scanorama = scib.ig.scanorama(adata.copy(), batch="tech")
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/intervaltree/intervaltree.py:37: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working class IntervalTree(collections.MutableSet):
Found 2000 genes among all datasets [[0. 0.67221007 0.04388715 0.15039841 0.12051793 0.01394422 0.05278884 0.06099196 0.25697211] [0. 0. 0.10188088 0.09654104 0.02726218 0.00437637 0.00690714 0.03619303 0.60962801] [0. 0. 0. 0.00940439 0.00783699 0.02507837 0.0031348 0.69278997 0.10971787] [0. 0. 0. 0. 0.52142488 0.62467734 0.32310054 0.02374806 0.03665462] [0. 0. 0. 0. 0. 0.27842227 0.69837587 0.00348028 0.11136891] [0. 0. 0. 0. 0. 0. 0.43591711 0.01407507 0.02552011] [0. 0. 0. 0. 0. 0. 0. 0.07444359 0.25863392] [0. 0. 0. 0. 0. 0. 0. 0. 0.6038874 ] [0. 0. 0. 0. 0. 0. 0. 0. 0. ]] Processing datasets (4, 6) Processing datasets (2, 7) Processing datasets (0, 1) Processing datasets (3, 5) Processing datasets (1, 8) Processing datasets (7, 8) Processing datasets (3, 4) Processing datasets (5, 6) Processing datasets (3, 6) Processing datasets (4, 5) Processing datasets (6, 8) Processing datasets (0, 8) Processing datasets (0, 3) Processing datasets (0, 4) Processing datasets (4, 8) Processing datasets (2, 8) Processing datasets (1, 2)
/home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/scanorama/scanorama.py:237: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour. adata = AnnData(datasets[i]) /home/jovyan/my-conda-envs/scib-pipeline-R4.0/lib/python3.7/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour. [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
adata_scanorama
AnnData object with n_obs × n_vars = 16382 × 2000
obs: 'tech', 'celltype', 'size_factors', 'tech_text', 'S_score', 'G2M_score', 'phase'
var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
obsm: 'X_scanorama', 'X_emb'
d = np.asarray(np.log10(np.abs((adata_scanorama.X.todense() - adata.X))))
z=plt.hist(d.flatten(),100)
sc.tl.pca(adata_scanorama)
sc.pp.neighbors(adata_scanorama,n_neighbors=15, n_pcs=50)
sc.tl.umap(adata_scanorama)
scib.me.cluster_optimal_resolution(adata_scanorama, cluster_key="cluster", label_key="celltype")
adata_scanorama.obs
resolution: 0.1, nmi: 0.8253553818911714 resolution: 0.2, nmi: 0.7872013545169586 resolution: 0.3, nmi: 0.7747808598675352 resolution: 0.4, nmi: 0.7713761198469122 resolution: 0.5, nmi: 0.7458968352116856 resolution: 0.6, nmi: 0.7458109175375651 resolution: 0.7, nmi: 0.7240065390358 resolution: 0.8, nmi: 0.7140047944357538 resolution: 0.9, nmi: 0.7144587895603192 resolution: 1.0, nmi: 0.7085816654419799 resolution: 1.1, nmi: 0.6986495162508116 resolution: 1.2, nmi: 0.7049137571604153 resolution: 1.3, nmi: 0.6924411934358862 resolution: 1.4, nmi: 0.694843716112199 resolution: 1.5, nmi: 0.691876692756587 resolution: 1.6, nmi: 0.6779308555148055 resolution: 1.7, nmi: 0.676184726199782 resolution: 1.8, nmi: 0.6750253861696504 resolution: 1.9, nmi: 0.6670905140046902 resolution: 2.0, nmi: 0.665233378393305 optimised clustering against celltype optimal cluster resolution: 0.1 optimal score: 0.8253553818911714
| tech | celltype | size_factors | tech_text | S_score | G2M_score | phase | cluster | |
|---|---|---|---|---|---|---|---|---|
| D101_5 | celseq | gamma | 0.028492 | celseq | 0.050221 | -0.422348 | S | 7 |
| D101_43 | celseq | gamma | 0.079348 | celseq | 0.012194 | -0.536017 | S | 7 |
| D101_93 | celseq | gamma | 0.037932 | celseq | -0.022595 | 0.015658 | G2M | 7 |
| D102_4 | celseq | gamma | 0.047685 | celseq | 0.128670 | -0.260789 | S | 7 |
| D172444_23 | celseq | gamma | 0.038683 | celseq | -0.041654 | -0.394326 | G1 | 7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| HP1504901_A22 | smartseq2 | quiescent_stellate | 2.533755 | smartseq2 | 0.257108 | -0.332051 | S | 8 |
| HP1504901_M11 | smartseq2 | quiescent_stellate | 2.211327 | smartseq2 | 0.005342 | -0.155503 | S | 8 |
| HP1504901_N21 | smartseq2 | quiescent_stellate | 1.586724 | smartseq2 | -0.102090 | -0.291845 | G1 | 8 |
| HP1507101_P15 | smartseq2 | quiescent_stellate | 1.716191 | smartseq2 | -0.020635 | -0.413889 | G1 | 8 |
| HP1508501T2D_M16 | smartseq2 | quiescent_stellate | 1.149515 | smartseq2 | 0.024129 | -0.302739 | S | 8 |
16382 rows × 8 columns
sc.pl.umap(adata_scanorama,color=['tech', 'celltype','cluster'])
paper = pd.read_csv('https://raw.githubusercontent.com/theislab/scib-reproducibility/main/data/metrics.csv')
paper.iloc[:,0].values[:4]
array(['/pancreas/metrics/unscaled/hvg/mnn_full',
'/pancreas/metrics/unscaled/full_feature/mnn_full',
'/lung_atlas/metrics/unscaled/hvg/mnn_full',
'/lung_atlas/metrics/unscaled/full_feature/mnn_full'], dtype=object)
x = paper.iloc[:,0].str.split('/')
paper['dataset'] = [y[1] for y in x]
paper['scaled'] = [y[3] for y in x]
paper['features'] = [y[4] for y in x]
paper['name'] = [y[5] for y in x]
paper
| Unnamed: 0 | NMI_cluster/label | ARI_cluster/label | ASW_label | ASW_label/batch | PCR_batch | cell_cycle_conservation | isolated_label_F1 | isolated_label_silhouette | graph_conn | kBET | iLISI | cLISI | hvg_overlap | trajectory | dataset | scaled | features | name | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /pancreas/metrics/unscaled/hvg/mnn_full | 0.913867 | 0.951113 | 0.617242 | 0.905702 | 0.855878 | 0.470498 | 0.107692 | 0.520902 | 0.971955 | 0.319771 | 7.924053e-02 | 1.000000 | 0.477221 | NaN | pancreas | unscaled | hvg | mnn_full |
| 1 | /pancreas/metrics/unscaled/full_feature/mnn_full | 0.632615 | 0.262193 | 0.564448 | 0.844820 | 0.712545 | 0.741364 | 0.106870 | 0.550404 | 0.944990 | 0.218333 | 4.064868e-03 | 0.999384 | 0.202589 | NaN | pancreas | unscaled | full_feature | mnn_full |
| 2 | /lung_atlas/metrics/unscaled/hvg/mnn_full | 0.722076 | 0.480512 | 0.577154 | 0.906871 | 0.674971 | 0.791231 | 0.775781 | 0.523085 | 0.950466 | 0.238778 | 5.390896e-02 | 0.997466 | 0.309707 | NaN | lung_atlas | unscaled | hvg | mnn_full |
| 3 | /lung_atlas/metrics/unscaled/full_feature/mnn_... | 0.722318 | 0.446907 | 0.527266 | 0.892211 | 0.469516 | 0.937218 | 0.693204 | 0.582459 | 0.983901 | 0.236504 | 1.933887e-02 | 0.998204 | 0.149390 | NaN | lung_atlas | unscaled | full_feature | mnn_full |
| 4 | /immune_cell_hum/metrics/unscaled/hvg/mnn_full | 0.772948 | 0.561214 | 0.586781 | 0.897345 | 0.901297 | 0.772181 | 0.817978 | 0.567297 | 0.991186 | 0.139963 | 1.515592e-01 | 0.996077 | 0.382497 | 0.857006 | immune_cell_hum | unscaled | hvg | mnn_full |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 460 | /mouse_brain/metrics/unscaled/hvg/desc_embed | 0.460478 | 0.083753 | 0.652059 | 0.335802 | 0.799418 | 0.542941 | 0.960084 | 0.542539 | 0.923708 | NaN | 4.914801e-02 | 1.000000 | NaN | NaN | mouse_brain | unscaled | hvg | desc_embed |
| 461 | /mouse_brain/metrics/unscaled/full_feature/des... | 0.463906 | 0.083768 | 0.647280 | 0.326988 | 0.866849 | 0.531075 | 0.939191 | 0.535082 | 0.863886 | NaN | 1.036208e-15 | 1.000000 | NaN | NaN | mouse_brain | unscaled | full_feature | desc_embed |
| 462 | /mouse_brain/metrics/scaled/hvg/desc_embed | 0.457710 | 0.094341 | 0.659603 | 0.340742 | 0.821173 | 0.520832 | 0.956741 | 0.535077 | 0.895446 | NaN | 5.502357e-02 | 1.000000 | NaN | NaN | mouse_brain | scaled | hvg | desc_embed |
| 463 | /mouse_brain/metrics/scaled/full_feature/desc_... | 0.458128 | 0.076538 | 0.636085 | 0.313182 | 0.818394 | 0.497144 | 0.938700 | 0.521688 | 0.825321 | NaN | 6.661338e-16 | 1.000000 | NaN | NaN | mouse_brain | scaled | full_feature | desc_embed |
| 464 | /mouse_brain/metrics/unscaled/full_feature/uni... | 0.534594 | 0.207975 | 0.601672 | 0.535152 | 0.000000 | 0.223630 | 0.937341 | 0.551809 | 0.867779 | NaN | 0.000000e+00 | 1.000000 | NaN | NaN | mouse_brain | unscaled | full_feature | unintegrated_full |
465 rows × 19 columns
paper.features.value_counts()
full_feature 236 hvg 229 Name: features, dtype: int64
paper.name.value_counts()
scanorama_full 28 saucie_full 28 saucie_embed 28 desc_embed 28 harmony_embed 28 bbknn_knn 28 scanorama_embed 28 fastmnn_full 28 conos_knn 28 combat_full 28 fastmnn_embed 28 mnn_full 24 scgen_full 24 seurat_full 24 seuratrpca_full 24 scvi_embed 14 liger_embed 14 scanvi_embed 14 trvae_embed 12 unintegrated_full 7 Name: name, dtype: int64
paper_metrics = paper.loc[paper.dataset.isin(['pancreas']) &
((paper.name.isin(['bbknn_knn','combat_full','unintegrated_full','scanorama_full']) &
paper.features.isin(['hvg'])& paper.scaled.isin(['scaled'])) |
paper.name.isin(['unintegrated_full'])),:]
paper_metrics = paper_metrics.iloc[[3,1,2,0],:]
paper_metrics
| Unnamed: 0 | NMI_cluster/label | ARI_cluster/label | ASW_label | ASW_label/batch | PCR_batch | cell_cycle_conservation | isolated_label_F1 | isolated_label_silhouette | graph_conn | kBET | iLISI | cLISI | hvg_overlap | trajectory | dataset | scaled | features | name | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 408 | /pancreas/metrics/unscaled/full_feature/uninte... | 0.594300 | 0.220807 | 0.499072 | 0.770229 | 0.000000 | 1.000000 | 0.108527 | 0.495942 | 0.796101 | 0.138187 | 1.390000e-16 | 0.999165 | 1.000000 | NaN | pancreas | unscaled | full_feature | unintegrated_full |
| 216 | /pancreas/metrics/scaled/hvg/bbknn_knn | 0.861094 | 0.915275 | NaN | NaN | NaN | NaN | 0.088608 | NaN | 0.993727 | 0.508586 | 4.046644e-01 | 0.663050 | NaN | NaN | pancreas | scaled | hvg | bbknn_knn |
| 288 | /pancreas/metrics/scaled/hvg/combat_full | 0.912217 | 0.942302 | 0.627786 | 0.862917 | 1.000000 | 0.643196 | 0.128440 | 0.816237 | 0.985943 | 0.342787 | 9.636334e-02 | 1.000000 | 0.249845 | NaN | pancreas | scaled | hvg | combat_full |
| 60 | /pancreas/metrics/scaled/hvg/scanorama_full | 0.896182 | 0.934523 | 0.647533 | 0.716389 | 0.917091 | 0.446663 | 0.555556 | 0.668001 | 0.957474 | 0.287858 | 3.260134e-01 | 0.999570 | 0.310140 | NaN | pancreas | scaled | hvg | scanorama_full |
adata_combat.obs
| tech | celltype | size_factors | tech_text | S_score | G2M_score | phase | cluster | |
|---|---|---|---|---|---|---|---|---|
| D101_5 | celseq | gamma | 0.028492 | celseq | 0.050221 | -0.422348 | S | 5 |
| D101_43 | celseq | gamma | 0.079348 | celseq | 0.012194 | -0.536017 | S | 5 |
| D101_93 | celseq | gamma | 0.037932 | celseq | -0.022595 | 0.015658 | G2M | 5 |
| D102_4 | celseq | gamma | 0.047685 | celseq | 0.128670 | -0.260789 | S | 5 |
| D172444_23 | celseq | gamma | 0.038683 | celseq | -0.041654 | -0.394326 | G1 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_1594 | smarter | gamma | 1.000000 | smarter | 0.252066 | -0.508501 | S | 5 |
| Sample_1595 | smarter | gamma | 1.000000 | smarter | 0.090100 | -0.388399 | S | 5 |
| Sample_1597 | smarter | gamma | 1.000000 | smarter | 0.027931 | -0.461743 | S | 0 |
| Sample_1598 | smarter | gamma | 1.000000 | smarter | 0.013514 | -0.207898 | S | 5 |
| Sample_1600 | smarter | gamma | 1.000000 | smarter | -0.198831 | -0.263792 | G1 | 5 |
16382 rows × 8 columns
integrated = {'no':adata_hgv,'scaled':adata_scaled,'bbknn':adata_bbknn,'combat':adata_combat,'scanorama':adata_scanorama}
%%capture
allmetrics = {k:scib.metrics.metrics(adata, integrated[k],batch_key="tech_text", label_key="celltype",organism='human',n_cores=8,
ari_=True,
nmi_=True,
clisi_=True,
ilisi_=True,
cell_cycle_=True,
silhouette_=True,
kBET_=True,
graph_conn_=True,
isolated_labels_f1_=True) for k in integrated.keys()}
pd.concat(allmetrics, axis=1)
{k:scib.me.ari(integrated[k], cluster_key="cluster", label_key="celltype") for k in integrated.keys()}
paper_metrics.loc[:,['ARI_cluster/label','name']]
{k:scib.me.nmi(integrated[k], cluster_key="cluster", label_key="celltype") for k in integrated.keys()}
paper_metrics.loc[:,['NMI_cluster/label','name']]
{k:scib.me.clisi_graph(integrated[k],label_key="celltype",type_='knn',k0=15) for k in integrated.keys()}
paper_metrics.loc[:,['cLISI','name']]
{k:scib.me.ilisi_graph(integrated[k],batch_key="tech",type_='knn',k0=15) for k in integrated.keys()}
paper_metrics.loc[:,['iLISI','name']]
I'm note really understand what is going on here :) I use umap as ebmedding, that is probably not correct (should be pca smth else with more dimentions)
{k:scib.me.cell_cycle(adata,integrated[k], batch_key="tech", embed="X_umap",organism='human',recompute_cc=False) for k in integrated.keys()}
paper_metrics.loc[:,['cell_cycle_conservation','name']]